Wanted: Floating-Point Add Round-off Error instruction

نویسندگان

  • Marat Dukhan
  • Richard W. Vuduc
  • E. Jason Riedy
چکیده

We propose a new instruction (FPADDRE) that computes the round-off error in floating-point addition. We explain how this instruction benefits high-precision arithmetic operations in applications where double precision is not sufficient. Performance estimates on Intel Haswell, Intel Skylake, and AMD Steamroller processors, as well as Intel Knights Corner co-processor, demonstrate that such an instruction would improve the latency of double-double addition by up to 55% and increase double-double addition throughput by up to 103%, with smaller, but non-negligible benefits for doubledouble multiplication. The new instruction delivers up to 2× speedups on three benchmarks that use high-precision floating-point arithmetic: double-double matrix-matrix multiplication, compensated dot product, and polynomial evaluation via the compensated Horner scheme.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Error bounds on complex floating-point multiplication with an FMA

The accuracy analysis of complex floating-point multiplication done by Brent, Percival, and Zimmermann [Math. Comp., 76:1469–1481, 2007] is extended to the case where a fused multiply-add (FMA) operation is available. Considering floating-point arithmetic with rounding to nearest and unit roundoff u, we show that their bound √ 5u on the normwise relative error |ẑ/z − 1| of a complex product z c...

متن کامل

Accurate Floating-Point Summation Part II: Sign, K-Fold Faithful and Rounding to Nearest

In this Part II of this paper we first refine the analysis of error-free vector transformations presented in Part I. Based on that we present an algorithm for calculating the rounded-to-nearest result of s := ∑ pi for a given vector of floatingpoint numbers pi, as well as algorithms for directed rounding. A special algorithm for computing the sign of s is given, also working for huge dimensions...

متن کامل

More Instruction Level Parallelism Explains the Actual Efficiency of Compensated Algorithms

The compensated Horner algorithm and the Horner algorithm with double-double arithmetic improve the accuracy of polynomial evaluation in IEEE-754 floating point arithmetic. Both yield a polynomial evaluation as accurate as if it was computed with the classic Horner algorithm in twice the working precision. Both algorithms also share the same low-level computation of the floating point rounding ...

متن کامل

Reducing Round-off Error in an ADSL Modem with Block-Floating-Point Structure

In this paper, we report an issue first observed when using block-floating-point calculations in a time equalizer of an ADSL modem. We believe this important phenomenon have been out of the sight of researchers till now; and show that neglecting this issue may increase the round-off error significantly and reduce the SNR of a system. The simulation results suggest that ignoring this issue may i...

متن کامل

What Every Computer Scientist Should Know About Floating-Point Arithmetic

Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous in computer systems. Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every op...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1603.00491  شماره 

صفحات  -

تاریخ انتشار 2016